17 research outputs found

    A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures

    Irregular computations on unstructured data are an important class of problems for parallel programming. Graph coloring is often an important preprocessing step, e.g. as a way to perform dependency analysis for safe parallel execution. The total run time of a coloring algorithm adds to the overall parallel overhead of the application, whereas the number of colors used determines the amount of exposed parallelism. A fast and scalable coloring algorithm using as few colors as possible is vital for the overall parallel performance and scalability of many irregular applications that depend upon runtime dependency analysis. Catalyurek et al. have proposed a graph coloring algorithm which relies on speculative, local assignment of colors. In this paper we present an improved version which runs even more optimistically, with less thread synchronization and a reduced number of conflicts compared to Catalyurek et al.'s algorithm. We show that the new technique scales better on multi-core and many-core systems and performs up to 1.5x faster than its predecessor on graphs with high-degree vertices, while keeping the number of colors at the same near-optimal levels. Comment: To appear in the proceedings of Euro Par 201
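    The speculative scheme mentioned in this abstract can be illustrated with a minimal sketch. It is a sequential emulation under stated assumptions, not the authors' implementation: the function name `speculative_coloring`, the adjacency-dict input, and the rule that the higher-numbered vertex of a conflicting pair is recolored are all illustrative choices. Each round colors every pending vertex from a stale snapshot (standing in for unsynchronized threads), then detects conflicts and retries only the losers.

```python
def speculative_coloring(adj):
    """Color an undirected graph given as {vertex: iterable of neighbours}.

    Sequential emulation of a speculative parallel scheme (hypothetical
    helper, not taken from the paper).
    """
    color = {v: -1 for v in adj}      # -1 means "not colored yet"
    pending = sorted(adj)
    rounds = 0
    while pending:
        rounds += 1
        # Tentative phase: every pending vertex reads the same stale
        # snapshot, as unsynchronized threads might.
        snapshot = dict(color)
        for v in pending:
            forbidden = {snapshot[u] for u in adj[v] if snapshot[u] >= 0}
            c = 0
            while c in forbidden:
                c += 1
            color[v] = c
        # Conflict detection: of two adjacent vertices with the same color,
        # the higher-numbered one gives up its color and retries.
        conflicts = [
            v for v in pending
            if any(u < v and color[u] == color[v] for u in adj[v])
        ]
        for v in conflicts:
            color[v] = -1
        pending = conflicts
    return color, rounds


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3 attached to 2.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    colors, n_rounds = speculative_coloring(graph)
    print(colors, n_rounds)
```

    The lowest-numbered pending vertex never loses a conflict, so every round finalizes at least one vertex and the loop terminates.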

    Restricted Power Diagrams on the GPU

    We propose a method to simultaneously decompose a 3D object into power diagram cells and to integrate given functions in each of the obtained simple regions. We offer a novel, highly parallel algorithm that lends itself to an efficient GPU implementation. It is optimized for algorithms that need to compute many decompositions, for instance, centroidal Voronoi tessellation algorithms and incompressible fluid dynamics simulations. We propose an efficient solution that directly evaluates the integrals over every cell without computing the power diagram explicitly and without intersecting it with a tetrahedralization of the domain. Most computations are performed on the fly, without storing the power diagram. We manipulate a triangulation of the boundary of the domain (instead of tetrahedralizing the domain) to speed up the process. Moreover, the cells are treated independently of one another, making it possible to trivially scale up on a parallel architecture. Despite recent Voronoi diagram generation methods optimized for the GPU, computing integrals over restricted power diagrams still poses significant challenges; the restriction to a complex simulation domain is difficult and likely to be slow. It is not trivial to determine when a cell of a power diagram is completely computed, and the resulting integrals (e.g. the weighted Laplacian operator matrix) do not fit into fast (shared) GPU memory. We address all these issues, boosting the performance of the state-of-the-art algorithms by a factor of 2 to 3 for (unrestricted) Voronoi diagrams and achieving a ×50 speed-up with respect to CPU implementations for restricted power diagrams. An essential ingredient to achieve this is our new scheduling strategy that allows us to treat each Voronoi/power diagram cell with optimal settings and to benefit from the fast memory.
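    For readers unfamiliar with power diagrams, the following brute-force sketch shows only the standard definition of a cell: a point belongs to the site that minimizes the power distance, i.e. the squared Euclidean distance minus the site's weight. It is not the GPU algorithm of the paper, and the name `power_cell_index` and the tuple-based inputs are assumptions made for illustration.

```python
def power_cell_index(point, sites, weights):
    """Return the index of the power cell that contains `point`.

    Hypothetical helper: evaluates |point - site|^2 - weight for every
    site and returns the index of the minimum. With equal weights this
    reduces to an ordinary Voronoi diagram.
    """
    best_index = -1
    best_value = float("inf")
    for i, (site, weight) in enumerate(zip(sites, weights)):
        squared_distance = sum((a - b) ** 2 for a, b in zip(point, site))
        value = squared_distance - weight
        if value < best_value:
            best_index, best_value = i, value
    return best_index


if __name__ == "__main__":
    sites = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    query = (1.2, 0.0, 0.0)
    # Equal weights: the nearer site owns the point.
    print(power_cell_index(query, sites, [0.0, 0.0]))
    # Raising the first site's weight enlarges its cell.
    print(power_cell_index(query, sites, [1.0, 0.0]))
```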

    Scaling Up Multiphysics


    A Parallel Navier-Stokes Method and Grid Adapter with Hybrid Prismatic/Tetrahedral Grids

    A parallel finite-volume method for the Navier-Stokes equations with adaptive hybrid prismatic/tetrahedral grids is presented and evaluated in terms of parallel performance. The solver is a central-type differencing scheme with Lax-Wendroff marching in time. The grid adapter combines directional with isotropic local refinement of the prisms and tetrahedra. The hybrid solver and the grid adapter are implemented on the Intel Paragon MIMD architecture. Reduction in execution time with increasing number of processors is close to linear. A parallel communication strategy is presented and the resulting communication times remain about the same with an increasing number of processors. Subdivision of the grids into subdomains is based on the co-ordinates of the cell centroids and different partitionings of the hybrid meshes are considered. The execution times for parallel solution of viscous flow around the HSCT configuration with hybrid grids are presented for different grid partit..
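    The abstract states only that subdomains are formed from the co-ordinates of the cell centroids. One simple way this could look, offered as an assumption rather than the paper's actual procedure, is to sort cells along a single axis and cut the sorted list into equally sized slabs. The name `partition_by_centroid` and its parameters are hypothetical.

```python
def partition_by_centroid(centroids, n_parts, axis=0):
    """Assign each cell to one of `n_parts` slabs along `axis`.

    Hypothetical sketch: cells are ranked by one centroid coordinate and
    the ranking is cut into contiguous, nearly equal chunks, so each
    subdomain receives about the same number of cells.
    """
    n_cells = len(centroids)
    order = sorted(range(n_cells), key=lambda i: centroids[i][axis])
    part = [0] * n_cells
    for rank, cell in enumerate(order):
        part[cell] = rank * n_parts // n_cells
    return part


if __name__ == "__main__":
    # Four cells whose x-centroids are 3, 1, 2 and 0.
    cells = [(3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    print(partition_by_centroid(cells, n_parts=2))
```

    Choosing a different `axis` gives a different slab decomposition, which is one way to obtain the alternative partitionings the abstract compares.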

    Parallel Automated Adaptive Procedures for Unstructured Meshes

    Contents
    1. Introduction
    2. Parallel Control of Evolving Meshes
        2.1 Mesh Data Structure to Support Geometry-Based Automated Adaptive Analysis
        2.2 Partition Communication and Mesh Migration
            2.2.1 Requirements of PMDB and Related Efforts
            2.2.2 Distributed Mesh Model and Notation Used
            2.2.3 Data Structures
            2.2.4 Mesh Migration
            2.2.5 Scalability of Mesh Migration and Extensions
        2.3 Dynamic Load Balancing of Adaptively Evolving Meshes
            2.3.1 Geometry-Based Dynamic Balancing Procedures
            2.3.2 Topologically-Based Dynamic Balancing Procedures
    3. Parallel Automatic Mesh Generation
        3.1 Introduction
        3.2 Background and Meshing Approach
        3.3 Sequential Region Meshing
            3.3.1 Underlying Octree
            3.3.2 Template Meshing of Interior Octants
            3.3.3 Face Removal
        3.4 Parallel Constructs Required
            3.4.1 Octree and Mesh Data Structures
            3.4.2 Multiple Octant Migration
            3.4.3 Dynamic Repartitioning
        3.5 Parallel Region Meshing
            3.5.1 Underlying Octree
            3.5.2 Template Meshing of Interior Octants
            3.5.3 Face Remova